Abstract

Consider  the  problem  of  value  iteration  for  solving  Markov  stochastic 
games.  One  simply  iterates  backwards,  via  a  Jacobi-like  procedure.  The 
convergence  of  the  Gauss-Seidel  form  of  this  procedure  is  shown  for  both 
the  discounted  and  ergodic  cost  problems,  under  appropriate  conditions, 
with  extensions  to  problems  where  one  stops  when  a  boundary  is  hit  or  if 
any  one  of  the  players  chooses  to  stop,  with  associated  costs.  Generally, 
the  Gauss-Seidel  procedure  accelerates  convergence. 
1  Introduction

We  consider  two-player,  zero-sum,  finite-state,  Markov  stochastic  games.  There 
are  N  states  and,  unless  noted  otherwise,  we  suppose  that  the  controls  are 
feedback and not randomized. In state $i$, player 1's (the minimizing player)
control is denoted by $u_i$ and that of player 2 (the maximizing player) is denoted
by $v_i$. The convergence of the value iteration procedure (see (2.2) below) for
Markov  stochastic  games  for  a  discounted  cost  function  (or  where  there  is  an 
absorbing  boundary)  was  established  in  [7,  11].  The  convergence  of  the  Gauss- 
Seidel  procedure  was  first  established  for  the  control  problem  in  [10].  It  is 
widely  used  and  is  generally  faster  than  the  Jacobi  procedure;  see,  for  example, 
[5, 10, 9], where the role of the ordering of the states and preferred orderings
are discussed. Indeed, it can be much faster, depending on the ordering of the
states in the iteration. The references [5, 10] discuss the nature of the modified
transition probability $Q$ (defined in Section 2) for the control problem and show
why it is faster.

The  convergence  of  the  Gauss-Seidel  form  has  not  yet  been  established  for 
the game problem. Under appropriate conditions, the convergence will be established
for the discounted and ergodic cost functions, and for related problems
such  as  where  there  is  an  absorbing  boundary  or  optional  stopping. 

The $u_i, v_i$ take values in compact sets that might depend on $i$. Define the
control vectors $u = \{u_i; i \le N\}$, $v = \{v_i; i \le N\}$. Let
$P(u,v) = \{p_{ij}(u_i,v_i); i, j \le N\}$ denote the transition probabilities under
controls $u, v$. For notational simplicity, it is assumed, when dealing with the
discounted problem, that the discount factor $\rho \in (0,1)$ is included in the
$p_{ij}(u_i,v_i)$. Hence $P(u,v)$ is degenerate in that case: the row sums are $\rho$
rather than 1. The cost rate when in state $i$ and under $u_i, v_i$ is the function
$k_i(u_i,v_i)$. Let $\{X_n\}$ denote the random variables of the chain.
Then the discounted cost under $u, v$ is

$$C_i(u,v) = E_i^{u,v} \sum_{n=0}^{\infty} k_{X_n}(u_{X_n}, v_{X_n}),$$

where $E_i^{u,v}$ denotes the expectation under $u, v$ and with initial state $i$. It is
always supposed that the $p_{ij}(u_i,v_i)$ and $k_i(u_i,v_i)$ are continuous in the $u_i, v_i$.
Define the vector $K(u,v) = \{k_i(u_i,v_i); i \le N\}$.

In addition, unless noted otherwise, we assume that the Isaacs condition
holds; namely, that for any $N$-vector $H = \{h_i; i \le N\}$,

$$\sup_v \inf_u [P(u,v)H + K(u,v)] = \inf_u \sup_v [P(u,v)H + K(u,v)]. \qquad (1.1)$$

In vector forms such as (1.1), it is always supposed that the inf and sup are
taken line by line, so that the $i$th line is
$\sup_{v_i} \inf_{u_i} [\sum_j p_{ij}(u_i,v_i) h_j + k_i(u_i,v_i)]$
and involves the inf and sup over $u_i$ and $v_i$ only. The condition (1.1) is used
for  notational  simplicity.  Otherwise,  one  must  randomize  the  controls.  Then, 
when  the  number  of  control  values  is  finite,  the  control  is  replaced  by  the  vector 
of  probabilities,  and  the  analog  of  (1.1)  holds.  If  the  controls  take  values  in  a 
continuum  and  (1.1)  does  not  hold,  then  the  randomization  is  more  complicated, 
but  the  results  can  be  readily  extended.  The  condition  (1.1)  commonly  holds  for 
the  games  arising  as  numerical  approximations  to  stochastic  differential  games 
under  the  conditions  of  [4,  6]. 
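
To make the line-by-line condition (1.1) concrete, the following minimal Python sketch compares the lower value (sup over $v_i$ of inf over $u_i$) with the upper value (inf over $u_i$ of sup over $v_i$) for a single line, assuming both players have finitely many pure controls. The function name `line_values` and the two payoff tables are hypothetical illustrations, not taken from the references; the entry `f[a][b]` stands for the bracketed quantity $\sum_j p_{ij}(u_i,v_i)h_j + k_i(u_i,v_i)$ with $u_i$ indexed by `a` and $v_i$ by `b`.

```python
def line_values(f):
    """Return (lower, upper) for one line of (1.1).

    f[a][b] is the bracketed quantity for minimizer control a (rows)
    and maximizer control b (columns); both control sets are finite.
    """
    n_u, n_v = len(f), len(f[0])
    # sup over v of inf over u
    lower = max(min(f[a][b] for a in range(n_u)) for b in range(n_v))
    # inf over u of sup over v
    upper = min(max(f[a][b] for b in range(n_v)) for a in range(n_u))
    return lower, upper


# Hypothetical table with a pure saddle point: (1.1) holds on this line.
saddle = [[1.0, 2.0],
          [3.0, 4.0]]
# Hypothetical "matching pennies" table: no pure saddle point, so the
# controls would have to be randomized.
pennies = [[1.0, -1.0],
           [-1.0, 1.0]]

print(line_values(saddle))   # lower equals upper
print(line_values(pennies))  # lower is strictly below upper
```

When the two numbers differ, as in the second table, the text's remark applies: the pure controls are replaced by probability vectors over the control values.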

Section  2  concerns  the  discounted  cost  problem  and  also  remarks  on  cases 
where  there  is  forced  stopping  on  hitting  a  boundary  or  with  optional  stopping. 
The ergodic cost problem is dealt with in Section 3 and the cost under $u, v$ is

$$\gamma(u,v) = \lim_n \frac{1}{n} E_i^{u,v} \sum_{l=1}^{n} k_{X_l}(u_{X_l}, v_{X_l}). \qquad (1.2)$$



In Section 3, it is first shown that the game version of a classical value iteration
method converges. Then this is adapted to the Gauss-Seidel procedure. If controls
$u^n, v^n$ are used at time $n$, then write
$\{p^{(n)}_{ij}(u^n, v^n; \cdots; u^1, v^1); i, j \le N\}$ for
the $n$-step transition probabilities. The ergodic cost problem uses the additional
assumption that there is an $\epsilon > 0$, a state $j_0$, and an integer $m < N$, such that

$$p^{(m)}_{i j_0}(u^m, v^m; \cdots; u^1, v^1) \ge \epsilon \qquad (1.3)$$

for all $i$ and all possible controls. This is a standard condition for the ergodic
cost problem in the control literature [12]. See also [1, Vol 2] and [5, pp. 156-158]. Of
particular interest are Markov chain games that arise as numerical approximations
of games with diffusion models as in [4, 6], where (1.3) will commonly hold
under the assumptions on the nondegeneracy of the diffusion in [4].
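
One conservative way to check (1.3) numerically when the control sets are finite is sketched below in Python. It relies on the elementary fact that a product of nonnegative matrices dominates, entry by entry, the same product of their entrywise lower bounds. So if the $m$th power of a matrix of lower bounds on the $p_{ij}(u_i,v_i)$ has column $j_0$ bounded below by a positive number, that number is a valid $\epsilon$ in (1.3). The test is sufficient but not necessary, and the names `mat_mul`, `minorization_bound`, and `p_min`, as well as the numbers, are hypothetical.

```python
def mat_mul(a, b):
    """Plain-Python product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][s] * b[s][j] for s in range(n)) for j in range(n)]
            for i in range(n)]


def minorization_bound(p_min, j0, m):
    """Lower bound on the m-step probability of reaching state j0.

    p_min[i][j] is assumed to be a lower bound on p_ij(u_i, v_i) over all
    controls. Returns the minimum over i of (p_min ** m)[i][j0]; a positive
    value is a valid epsilon in (1.3). State indices are 0-based here.
    """
    power = p_min
    for _ in range(m - 1):
        power = mat_mul(power, p_min)
    return min(row[j0] for row in power)


# Hypothetical entrywise lower bounds for a 3-state chain.
p_min = [[0.2, 0.3, 0.0],
         [0.1, 0.2, 0.3],
         [0.0, 0.3, 0.2]]

# One step gives no information (the last row has a zero bound in column 0),
# but two steps give a strictly positive bound.
eps_one_step = minorization_bound(p_min, j0=0, m=1)
eps_two_step = minorization_bound(p_min, j0=0, m=2)
print(eps_one_step, eps_two_step)
```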

To  date,  there  have  not  been  proofs  of  the  convergence  of  the  Gauss-Seidel 
method  for  either  the  game  or  the  control  problem  with  ergodic  cost  criteria. 
Indeed,  it  does  not  always  converge,  even  under  (1.3)  for  ergodic  models.  But, 
it  will  converge  if  (1.3)  holds  for  a  modified  transition  probability.  This  will 
be  discussed  further  in  Section  3.  The  modified  condition  holds  for  the  chains 
obtained  as  approximations  in  [4]  under  the  nondegeneracy  conditions  used 
there.  These  chains  are  obtained  via  the  Markov  chain  approximation  methods 
of  [9].  The  book  [2]  discusses  other  numerical  procedures,  based  on  nonlinear 
programming  methods.  The  paper  [3]  discusses  what  might  be  called  a  type  of 
combined  value  iteration  and  approximation  in  policy  space  method. 


2  The  Discounted  Cost  Problem  and  Extensions 

Until further notice, we consider the discounted cost case. Let $C_i$ denote the
value of the game when starting in state $i$ and define $C = \{C_i; i \le N\}$. In vector
form, the equation for the value is

$$C = \sup_v \inf_u [P(u,v)C + K(u,v)] = \inf_u \sup_v [P(u,v)C + K(u,v)]. \qquad (2.1)$$

In all such vector equations, the inf and sup are taken line by line; the $i$th line is over
$u_i, v_i$. Recall that the discounting is incorporated into the $P(u,v)$. Hence, for
any integer $m \ge 1$, the $m$-step matrix $P^{(m)}(u^m, v^m; \cdots; u^1, v^1)$ is a
contraction (in the Euclidean norm sense) uniformly in the choices of the controls
$\{u^n, v^n\}$. A unique solution $C$ exists and is the value [2, Theorem 3.1.1]. Let
$\bar u, \bar v$ denote any controls that realize (2.1).

Our aim is the computation of $C$, hence of optimizing controls as well. A
variety of computational methods are available. In [7, 8, 11] it was shown that,
for any $C^0$, the $C^n$ in the iteration in value space algorithm

$$C^{n+1} = \sup_v \inf_u [P(u,v)C^n + K(u,v)] \qquad (2.2)$$

converge to $C$.
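
The backward iteration (2.2) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation of [7, 8, 11]: the three-state game built by the hypothetical helper `make_game` is invented for the example, both players have two pure controls per state, the discount factor is folded into the transition rows as in the text, and the sketch simply evaluates sup over $v_i$ of inf over $u_i$ without checking the Isaacs condition.

```python
RHO = 0.9  # discount factor, folded into the transition rows as in the text


def make_game(n=3, rho=RHO):
    """Build hypothetical data: p[i][a][b] is the row {p_ij(a, b); j} with
    row sum rho, and k[i][a][b] is the cost rate k_i(a, b)."""
    p, k = [], []
    for i in range(n):
        p_i, k_i = [], []
        for a in range(2):          # control of player 1 (minimizer)
            p_ia, k_ia = [], []
            for b in range(2):      # control of player 2 (maximizer)
                w = [1.0 + (i + 2 * a + 3 * b + j) % 4 for j in range(n)]
                total = sum(w)
                p_ia.append([rho * x / total for x in w])
                k_ia.append(1.0 + i + a - 0.5 * b * (i + 1))
            p_i.append(p_ia)
            k_i.append(k_ia)
        p.append(p_i)
        k.append(k_i)
    return p, k


def line_value(p_i, k_i, c):
    """sup over v_i of inf over u_i of [sum_j p_ij c_j + k_i] for one line."""
    return max(
        min(sum(pj * cj for pj, cj in zip(p_i[a][b], c)) + k_i[a][b]
            for a in range(len(p_i)))
        for b in range(len(p_i[0])))


def jacobi_step(p, k, c):
    """One pass of (2.2): every line uses the old vector c."""
    return [line_value(p[i], k[i], c) for i in range(len(c))]


p, k = make_game()
c = [0.0, 0.0, 0.0]                 # arbitrary starting vector C^0
for _ in range(400):
    c = jacobi_step(p, k, c)

# Residual of the fixed-point equation (2.1) at the computed vector.
residual = max(abs(x - y) for x, y in zip(jacobi_step(p, k, c), c))
print(c, residual)
```

A fixed number of passes is used only to keep the sketch short; in practice one would stop when successive iterates agree to a tolerance.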
The Gauss-Seidel procedure for the game problem is the iteration in value
space with successive substitutions, taken in the order $i = 1, 2, \ldots$,

$$C_i^{n+1} = \sup_{v_i} \inf_{u_i} \Big[ \sum_{j=1}^{i-1} p_{ij}(u_i,v_i) C_j^{n+1} + \sum_{j=i}^{N} p_{ij}(u_i,v_i) C_j^{n} + k_i(u_i,v_i) \Big]. \qquad (2.3)$$

The convergence proof in Theorem 2.1 adapts the method of [5, 10]. The ordering
of the states can vary with $n$.
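
The difference between (2.2) and (2.3) is only where each line reads its values from, which the following Python sketch tries to make visible. It is an illustration under assumptions: the game data from the hypothetical helper `make_game` are invented, both players have two pure controls per state, and the Isaacs condition is not checked. The Gauss-Seidel sweep overwrites the vector in place, so line $i$ sees the new values for $j < i$ and the old ones for $j \ge i$.

```python
RHO = 0.9  # discount factor, folded into the transition rows as in the text


def make_game(n=3, rho=RHO):
    """Hypothetical data: p[i][a][b] is a transition row with row sum rho,
    k[i][a][b] is the cost rate; a indexes u_i and b indexes v_i."""
    p, k = [], []
    for i in range(n):
        p_i, k_i = [], []
        for a in range(2):
            p_ia, k_ia = [], []
            for b in range(2):
                w = [1.0 + (i + 2 * a + 3 * b + j) % 4 for j in range(n)]
                total = sum(w)
                p_ia.append([rho * x / total for x in w])
                k_ia.append(1.0 + i + a - 0.5 * b * (i + 1))
            p_i.append(p_ia)
            k_i.append(k_ia)
        p.append(p_i)
        k.append(k_i)
    return p, k


def line_value(p_i, k_i, c):
    """sup over v_i of inf over u_i of [sum_j p_ij c_j + k_i] for one line."""
    return max(
        min(sum(pj * cj for pj, cj in zip(p_i[a][b], c)) + k_i[a][b]
            for a in range(len(p_i)))
        for b in range(len(p_i[0])))


def jacobi_step(p, k, c):
    """One pass of (2.2): every line uses the old vector."""
    return [line_value(p[i], k[i], c) for i in range(len(c))]


def gauss_seidel_step(p, k, c):
    """One sweep of (2.3) in the order i = 1, 2, ...: line i sees the new
    values for j < i and the old values for j >= i."""
    c = list(c)
    for i in range(len(c)):
        c[i] = line_value(p[i], k[i], c)
    return c


def run(step, p, k, tol=1e-10, max_sweeps=5000):
    """Iterate `step` from C^0 = 0 until successive sweeps differ by < tol."""
    c = [0.0] * len(p)
    for sweeps in range(1, max_sweeps + 1):
        new = step(p, k, c)
        if max(abs(x - y) for x, y in zip(new, c)) < tol:
            return new, sweeps
        c = new
    raise RuntimeError("no convergence within max_sweeps")


p, k = make_game()
c_jacobi, n_jacobi = run(jacobi_step, p, k)
c_gs, n_gs = run(gauss_seidel_step, p, k)
gap = max(abs(x - y) for x, y in zip(c_jacobi, c_gs))
print(n_jacobi, n_gs, gap)   # sweep counts for each method, and their gap
```

Both procedures reach the same vector up to the tolerance; the printed sweep counts allow a comparison of the two on this particular toy problem only.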

Before proceeding, it is convenient to define a transition probability and cost
vector that appear in the analysis. Consider the set of linear equations, where
the vector $C$ is given, solved by successive substitution in the order $i = 1, 2, \ldots$:

$$D_i = \sum_{j=1}^{i-1} p_{ij}(u_i,v_i) D_j + \sum_{j=i}^{N} p_{ij}(u_i,v_i) C_j + k_i(u_i,v_i), \quad 1 \le i \le N. \qquad (2.4)$$

This uniquely defines a matrix $Q(u,v) = \{q_{ij}(u,v); i, j \le N\}$ and vector
$\widetilde K(u,v) = \{\tilde k_i(u,v); i \le N\}$ such that $D = Q(u,v)C + \widetilde K(u,v)$.
In detail, by successive substitutions in (2.4), we find that

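$$q_{1j}(u,v) = p_{1j}(u_1,v_1), \quad 1 \le j \le N,$$
$$q_{21}(u,v) = p_{21}(u_2,v_2)\, q_{11}(u,v),$$
$$q_{2j}(u,v) = p_{2j}(u_2,v_2) + p_{21}(u_2,v_2)\, q_{1j}(u,v), \quad 2 \le j \le N.$$

In general,

$$\begin{aligned}
q_{ij}(u,v) &= p_{ij}(u_i,v_i) + \sum_{k=1}^{i-1} p_{ik}(u_i,v_i)\, q_{kj}(u,v), && j \ge i, \\
q_{ij}(u,v) &= \sum_{k=1}^{i-1} p_{ik}(u_i,v_i)\, q_{kj}(u,v), && 1 \le j < i.
\end{aligned} \qquad (2.5)$$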

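$Q(u,v)$ can also be defined from (2.4) in terms of the upper and lower triangular
matrices formed from $P(u,v)$, but we prefer to write the details. Also, the
components $\tilde k_i(u,v)$ of the cost vector $\widetilde K(u,v)$ in (2.4) satisfy

$$\tilde k_1(u,v) = k_1(u_1,v_1),$$
$$\tilde k_2(u,v) = p_{21}(u_2,v_2)\, \tilde k_1(u,v) + k_2(u_2,v_2),$$

and, in general,

$$\tilde k_i(u,v) = \sum_{j=1}^{i-1} p_{ij}(u_i,v_i)\, \tilde k_j(u,v) + k_i(u_i,v_i). \qquad (2.6)$$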

Note that, for the discounted cost problem where the discount factor is incorporated
into the $p_{ij}(u_i,v_i)$, $Q(u,v)$ is a degenerate transition matrix since the row
sums satisfy $\sum_j q_{ij}(u,v) \le \rho$ for all $i$ and controls. If there is no discounting,
then the row sums are always unity. These facts are easily proved by induction,
starting with $i = 1$.
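
The recursions (2.5) and (2.6) translate directly into code. The Python sketch below builds the modified matrix and cost vector for one fixed pair of controls and checks them against the successive substitution (2.4). The function names, the matrix `p`, and the vectors `k` and `c` are hypothetical; `p` is a stochastic matrix scaled by an assumed discount factor of 0.9.

```python
def build_q(p, k):
    """Return (q, k_tilde) from (2.5) and (2.6) for one fixed control pair.

    p[i][j] = p_ij(u_i, v_i) and k[i] = k_i(u_i, v_i); indices are 0-based.
    """
    n = len(p)
    q = [[0.0] * n for _ in range(n)]
    k_tilde = [0.0] * n
    for i in range(n):
        for j in range(n):
            # contribution routed through already-updated states s < i
            q[i][j] = sum(p[i][s] * q[s][j] for s in range(i))
            if j >= i:
                q[i][j] += p[i][j]      # direct term, present only for j >= i
        k_tilde[i] = sum(p[i][s] * k_tilde[s] for s in range(i)) + k[i]
    return q, k_tilde


def successive_substitution(p, k, c):
    """Solve (2.4) for D in the order i = 1, 2, ... ."""
    n = len(p)
    d = []
    for i in range(n):
        d.append(sum(p[i][j] * d[j] for j in range(i))
                 + sum(p[i][j] * c[j] for j in range(i, n))
                 + k[i])
    return d


rho = 0.9
# Hypothetical stochastic matrix scaled by rho, a cost vector, and a vector C.
p = [[rho * x for x in row] for row in
     [[0.5, 0.3, 0.2], [0.2, 0.5, 0.3], [0.3, 0.3, 0.4]]]
k = [1.0, 2.0, 0.5]
c = [4.0, -1.0, 2.5]

q, k_tilde = build_q(p, k)
d = successive_substitution(p, k, c)
d_from_q = [sum(q[i][j] * c[j] for j in range(3)) + k_tilde[i] for i in range(3)]
row_sums = [sum(row) for row in q]
print(row_sums)   # each row sum is at most rho
```

On this example the first row sum equals the discount factor and the later ones are strictly smaller, in line with the induction argument mentioned above.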

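Theorem 2.1. For any $C^0$, the $C^n$ in (2.3) converges to $C$.

Proof. Since

$$C_i = \sup_{v_i} \inf_{u_i} \Big[ \sum_{j=1}^{i-1} p_{ij}(u_i,v_i) C_j + \sum_{j=i}^{N} p_{ij}(u_i,v_i) C_j + k_i(u_i,v_i) \Big], \qquad (2.7)$$

by successive substitutions, we can write (2.1) in the equivalent form

$$C = \sup_v \inf_u [Q(u,v)C + \widetilde K(u,v)] = Q(\bar u,\bar v)C + \widetilde K(\bar u,\bar v), \qquad (2.8)$$

where $\widetilde K(u,v)$ is the cost vector defined by (2.4) and (2.6).
Similarly, with $u^n, v^n$ realizing (2.3), the following is equivalent to (2.3):

$$C^{n+1} = \sup_v \inf_u [Q(u,v)C^n + \widetilde K(u,v)] = Q(u^n,v^n)C^n + \widetilde K(u^n,v^n). \qquad (2.9)$$

In (2.8) and (2.9), it is understood that the inf and sup are again taken line by
line, in the order $i = 1, 2, \ldots$. The inf and sup in line 1 are over $u_1$ and $v_1$, and
in turn, those in line $i$ are over $u_i, v_i$.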

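For any $u, v$, and $i = 1, \ldots, N$, (2.1) yields

$$\begin{aligned}
&\sum_{j=1}^{i-1} p_{ij}(\bar u_i, v_i) C_j + \sum_{j=i}^{N} p_{ij}(\bar u_i, v_i) C_j + k_i(\bar u_i, v_i) \\
&\quad \le C_i = \sup_{v_i} \inf_{u_i} \Big[ \sum_{j=1}^{i-1} p_{ij}(u_i,v_i) C_j + \sum_{j=i}^{N} p_{ij}(u_i,v_i) C_j + k_i(u_i,v_i) \Big] \\
&\quad = \sum_{j=1}^{i-1} p_{ij}(\bar u_i, \bar v_i) C_j + \sum_{j=i}^{N} p_{ij}(\bar u_i, \bar v_i) C_j + k_i(\bar u_i, \bar v_i) \\
&\quad \le \sum_{j=1}^{i-1} p_{ij}(u_i, \bar v_i) C_j + \sum_{j=i}^{N} p_{ij}(u_i, \bar v_i) C_j + k_i(u_i, \bar v_i).
\end{aligned}$$

In vector notation, this can be written as

$$P(\bar u, v)C + K(\bar u, v) \le C \le P(u, \bar v)C + K(u, \bar v). \qquad (2.10)$$

It can also be written as

$$Q(\bar u, v)C + \widetilde K(\bar u, v) \le C \le Q(u, \bar v)C + \widetilde K(u, \bar v). \qquad (2.11)$$

For any $u, v$, (2.3) or, equivalently, (2.9) yields

$$Q(u^n, v)C^n + \widetilde K(u^n, v) \le C^{n+1} = \sup_v \inf_u [Q(u,v)C^n + \widetilde K(u,v)] \le Q(u, v^n)C^n + \widetilde K(u, v^n). \qquad (2.12)$$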
Selecting $(u,v) = (u^n, v^n)$ in (2.11) and $(u,v) = (\bar u, \bar v)$ in (2.12) yields

$$\begin{aligned}
Q(u^n, \bar v)(C^n - C) &= [Q(u^n, \bar v)C^n + \widetilde K(u^n, \bar v)] - [Q(u^n, \bar v)C + \widetilde K(u^n, \bar v)] \\
&\le C^{n+1} - C \\
&\le [Q(\bar u, v^n)C^n + \widetilde K(\bar u, v^n)] - [Q(\bar u, v^n)C + \widetilde K(\bar u, v^n)] \\
&= Q(\bar u, v^n)(C^n - C).
\end{aligned} \qquad (2.13)$$

Iterating yields

$$[Q(u^n, \bar v) \cdots Q(u^0, \bar v)](C^0 - C) \le C^{n+1} - C \le [Q(\bar u, v^n) \cdots Q(\bar u, v^0)](C^0 - C). \qquad (2.14)$$

For the discounted problem the row sums of $Q(u,v)$ satisfy $\sum_j q_{ij}(u,v) \le \rho$
for all $i$ and controls. Hence $C^n \to C$ as $n \to \infty$. ■
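
A worked consequence of (2.14), given here as a sketch that uses only the two facts already stated: every $Q(u,v)$ has nonnegative entries, and for the discounted problem its row sums are at most $\rho$. A product of $n+1$ such matrices then has nonnegative entries and row sums at most $\rho^{n+1}$, so each component of either bound in (2.14) is at most $\rho^{n+1} \max_j |C_j^0 - C_j|$ in absolute value. Hence

$$\max_i |C_i^{n+1} - C_i| \le \rho^{n+1} \max_j |C_j^0 - C_j| .$$

This gives a geometric rate with ratio at most $\rho$ per sweep. It is a crude bound: the acceleration that depends on the ordering of the states is not captured by it.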

Stopping when hitting a boundary set. Now, we allow $\rho \in (0,1]$, so that
the discounting can be dropped if desired. Suppose that the process stops when
a boundary set is hit and that the mean time to reach the boundary set is
bounded, uniformly in the controls and initial condition. Thus we can suppose
that the boundary set is absorbing and has zero cost. Without loss of generality,
let 0 denote the boundary state. Let $P(u,v)$ still denote the matrix of transition
probabilities among the states $1, \ldots, N$ only. With $C = \{C_i; 1 \le i \le N\}$ and
$C^n = \{C_i^n; 1 \le i \le N\}$ the products on either side of (2.14) go to zero. Thus
$C^n \to C$. It is preferable if state 1 is connected to the boundary and the states
are ordered so that the “mean flow” is toward the boundary as one goes from
the lower to the higher numbered states, where possible.

Optional stopping problems. As in the above paragraph, let $\rho \in (0,1]$.
Various forms of optional stopping can be handled. There are now three ways
that the process can be stopped. One is by hitting a predefined stopping set,
denoted by state 0, as in the previous paragraph. Call the time $\tau_0$. Otherwise,
either player can decide to have the game stopped. The associated times are
called $\tau_i$ for player $i$. After stopping for whatever reason, the state goes to
absorbing 0, with zero holding cost there. The $P(u,v)$ represents the transition
probabilities only among the states $1, \ldots, N$. For given functions $g_i(\cdot)$, the cost
is now

$$C_i(u,v) = E_i^{u,v} \Big[ \sum_{n=0}^{\tau_0 \wedge \tau_1 \wedge \tau_2 - 1} k_{X_n}(u_{X_n}, v_{X_n}) + g_1(X_{\tau_1}) I_{\{\tau_1 < \tau_2,\ \tau_1 < \tau_0\}} + g_2(X_{\tau_2}) I_{\{\tau_2 < \tau_1 \wedge \tau_0\}} \Big].$$

The controls $u_i, v_i$ can now take the new value stop as well as the original values
used in Theorem 2.1. Let $k_i(u_i,v_i) \ge \epsilon > 0$ for all $i, u_i, v_i$ values other than
the value stop, and suppose that $g_1(\cdot) \ne g_2(\cdot)$ but $g_1(i) \ge g_2(i)$. Extend the
definition of the $k_i(\cdot)$ to include the control value stop, by writing
$k_i(\text{stop}, v_i) = g_1(i)$ and let $k_i(u_i, \text{stop}) = g_2(i)$ if $u_i \ne$ stop and let it be
zero otherwise. Then the Gauss-Seidel algorithm can be written as (2.3). We have
$g_2(i) \le C_i^n \le g_1(i)$. Similarly (2.1) holds and $g_2(i) \le C_i \le g_1(i)$.

Let $(u^n, v^n)$ satisfy (2.3) and $(\bar u, \bar v)$ satisfy (2.1). Due to the positivity of
$k_i(u_i,v_i)$, for $u_i$ and/or $v_i$ not equal to stop, if player 1 uses $u^n$ and player 2 uses
some $\tilde v^n$ at time $n$, then $P(u^n, \tilde v^n) \cdots P(u^0, \tilde v^0) \to 0$ and
$Q(u^n, \tilde v^n) \cdots Q(u^0, \tilde v^0) \to 0$ as $n \to \infty$ uniformly in the $\{\tilde v^n\}$ choices.
Analogously, $Q(\bar u, \tilde v^n) \cdots Q(\bar u, \tilde v^0) \to 0$ for all $\{\tilde v^n\}$ choices.
Using these facts and following the logic of the proof of
Theorem 2.1 yields the convergence $C^n \to C$ for this problem.


3  The  Ergodic  Cost  Problem 

Now $P(u,v)$ is the transition matrix for a controlled Markov chain which is
ergodic under any $u, v$. We adapt the procedure of [5, pp. 156-158], originally
due to White [12]. Let $e$ denote the $N$-vector, all of whose components are
unity.


A Jacobi procedure. We first consider the analog of the simple backwards
iteration (Jacobi) procedure (2.2), whose convergence for the game with ergodic
payoffs has not been proved to date in the literature. For arbitrary $W^0$, define
the vectors $W^n, w^n$ recursively by

$$\begin{aligned}
W^n &= \sup_v \inf_u [P(u,v) w^{n-1} + K(u,v)], \\
w^n &= W^n - W^n_{j_0} e,
\end{aligned} \qquad (3.1)$$

where $j_0$ is defined above, in (1.3). There is a value for the game [2, Section 5.2].
The value $\gamma$ is given by

$$W + \gamma e = \sup_v \inf_u [P(u,v)W + K(u,v)]. \qquad (3.2)$$

As for the control problem, the value of $W$ is unique, up to the addition of a
vector with constant components, and the value of $\gamma$ is unique. An alternative
way of writing (3.2) is as

$$\begin{aligned}
W &= \sup_v \inf_u [P(u,v) w + K(u,v)], \\
w &= W - W_{j_0} e.
\end{aligned} \qquad (3.2a)$$
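
The recursion (3.1) can be sketched in Python as follows. This is an illustration under assumptions, not the procedure of [5] or [12] verbatim: the three-state game from the hypothetical helper `make_game` is invented, every transition probability in it is positive (so (1.3) holds with $m = 1$), both players have two pure controls per state, and the Isaacs condition is not checked, so the number returned is simply the constant that solves (3.2) as written with sup inf over pure controls.

```python
def make_game(n=3):
    """Hypothetical data: p[i][a][b] is a transition row with row sum 1 and
    all entries positive; k[i][a][b] is the cost rate."""
    p, k = [], []
    for i in range(n):
        p_i, k_i = [], []
        for a in range(2):          # control of player 1 (minimizer)
            p_ia, k_ia = [], []
            for b in range(2):      # control of player 2 (maximizer)
                w = [1.0 + (i + 2 * a + 3 * b + j) % 4 for j in range(n)]
                total = sum(w)
                p_ia.append([x / total for x in w])
                k_ia.append(1.0 + i + a - 0.5 * b * (i + 1))
            p_i.append(p_ia)
            k_i.append(k_ia)
        p.append(p_i)
        k.append(k_i)
    return p, k


def line_value(p_i, k_i, w):
    """sup over v_i of inf over u_i of [sum_j p_ij w_j + k_i] for one line."""
    return max(
        min(sum(pj * wj for pj, wj in zip(p_i[a][b], w)) + k_i[a][b]
            for a in range(len(p_i)))
        for b in range(len(p_i[0])))


def relative_value_iteration(p, k, j0=0, sweeps=300):
    """Iterate (3.1): W^n from w^{n-1}, then recenter at state j0."""
    n = len(p)
    w = [0.0] * n
    big_w = list(w)
    for _ in range(sweeps):
        big_w = [line_value(p[i], k[i], w) for i in range(n)]  # first line of (3.1)
        w = [x - big_w[j0] for x in big_w]                     # second line of (3.1)
    return big_w, w


p, k = make_game()
big_w, w = relative_value_iteration(p, k)
gamma = big_w[0]   # W_{j0} at (numerical) convergence, with j0 = 0 here

# Residual of (3.2): W + gamma e should equal sup inf [P W + K] line by line.
residual = max(abs(line_value(p[i], k[i], big_w) - big_w[i] - gamma)
               for i in range(len(p)))
print(gamma, residual)
```

Recentering at state $j_0$ keeps the iterates bounded; without it the vectors would drift by roughly the ergodic cost at each pass.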


Theorem 3.1. $W^n_{j_0}$ converges to the value $\gamma$ of the game.

Proof. Recall the condition (1.3) and the definitions of $m$ and $j_0$ there. Let
$u^n, v^n$ be the selected controls in (3.1). Define $c_n = W^n_{j_0}$. Then, for any $u, v$,

$$P(u^n, v) w^{n-1} + K(u^n, v) \le W^n = P(u^n, v^n) w^{n-1} + K(u^n, v^n) \le P(u, v^n) w^{n-1} + K(u, v^n). \qquad (3.3)$$




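Let $(u,v) = (u^{n-1}, v^{n-1})$ in (3.3) and use the definition of $w^n$ in (3.1) to get

$$P(u^n, v^{n-1}) w^{n-1} + K(u^n, v^{n-1}) - c_n e \le w^n \le P(u^{n-1}, v^n) w^{n-1} + K(u^{n-1}, v^n) - c_n e.$$

Replacing $n$ with $n - 1$ in (3.3) and letting $(u,v) = (u^n, v^n)$ yields, for $i \le N$,

$$P(u^{n-1}, v^n) w^{n-2} + K(u^{n-1}, v^n) - c_{n-1} e \le w^{n-1} \le P(u^n, v^{n-1}) w^{n-2} + K(u^n, v^{n-1}) - c_{n-1} e.$$

The last two inequalities yield

$$P(u^n, v^{n-1})(w^{n-1} - w^{n-2}) - (c_n - c_{n-1}) e \le w^n - w^{n-1} \le P(u^{n-1}, v^n)(w^{n-1} - w^{n-2}) - (c_n - c_{n-1}) e. \qquad (3.4)$$

Iterating (3.4) $m - 1$ times leads to

$$\begin{aligned}
&P(u^n, v^{n-1}) \cdots P(u^{n-m+1}, v^{n-m})(w^{n-m} - w^{n-m-1}) - (c_n - c_{n-m}) e \\
&\quad \le w^n - w^{n-1} \\
&\quad \le P(u^{n-1}, v^n) \cdots P(u^{n-m}, v^{n-m+1})(w^{n-m} - w^{n-m-1}) - (c_n - c_{n-m}) e.
\end{aligned} \qquad (3.5)$$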

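Define $\delta w^n = \{\delta w_i^n = w_i^n - w_i^{n-1}; i \le N\}$. Then the right hand inequality of
(3.5) yields, for $i \le N$,

$$\delta w_i^n \le \sum_j p^{(m)}_{ij}(u^{n-1}, v^n; \cdots; u^{n-m}, v^{n-m+1})\, \delta w_j^{n-m} - [W^n_{j_0} - W^{n-m}_{j_0}]. \qquad (3.6)$$

Since $w^n_{j_0} = 0$ for all $n$, we have $\delta w^n_{j_0} = 0$. This, with (3.6) and (from (1.3))

$$p^{(m)}_{i j_0}(u^{n-1}, v^n; \cdots; u^{n-m}, v^{n-m+1}) \ge \epsilon > 0$$

for all $i$, $n$, and controls, yields

$$\max_i \delta w_i^n \le (1 - \epsilon) \max_j \delta w_j^{n-m} - [W^n_{j_0} - W^{n-m}_{j_0}].$$

Analogously, using the fact that $\min_j \delta w_j^n \le 0$ and that

$$p^{(m)}_{i j_0}(u^n, v^{n-1}; \cdots; u^{n-m+1}, v^{n-m}) \ge \epsilon$$

for all $i, n$, the left hand inequality of (3.5) yields

$$\min_i \delta w_i^n \ge (1 - \epsilon) \min_j \delta w_j^{n-m} - [W^n_{j_0} - W^{n-m}_{j_0}].$$

Hence,

$$\Big[ \max_i - \min_i \Big] \delta w_i^n \le (1 - \epsilon) \Big[ \max_i - \min_i \Big] \delta w_i^{n-m}, \qquad (3.7)$$

which implies that $w^n$ converges to, say, $w$. Hence $W^n$ converges to, say, $W$,
and the limits satisfy (3.2a). Hence, (3.2) holds with $\gamma = W_{j_0}$. ■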


The Gauss-Seidel procedure. The Gauss-Seidel form of (3.1) is, in order
$i = 1, 2, \ldots$,

$$W_i^n = \sup_{v_i} \inf_{u_i} \Big[ \sum_{j=1}^{i-1} p_{ij}(u_i,v_i) W_j^n + \sum_{j=i}^{N} p_{ij}(u_i,v_i) w_j^{n-1} + k_i(u_i,v_i) \Big]. \qquad (3.8)$$

Recall the definition of $Q(u,v)$ and $\widetilde K(u,v)$ from Section 2. Then, in matrix
notation, (3.8) can be written as

$$\begin{aligned}
W^n &= \sup_v \inf_u [Q(u,v) w^{n-1} + \widetilde K(u,v)], \\
w^n &= W^n - W^n_{j_0} e.
\end{aligned} \qquad (3.9)$$

The condition (1.3) is no longer sufficient for convergence. For arbitrary controls
$\{u^n, v^n\}$, let $q^{(n)}_{ij}(u^n, v^n; \cdots; u^1, v^1)$ denote the $i,j$th element of
$Q(u^n, v^n) \cdots Q(u^1, v^1)$.
We now require the additional condition that there are $\epsilon > 0$, $j_0$, and an integer
$m$, such that for all controls $\{u^n, v^n\}$ and all $i$,

$$q^{(m)}_{i j_0}(u^m, v^m; \cdots; u^1, v^1) \ge \epsilon > 0. \qquad (3.10)$$

The condition is discussed below the theorem.



Theorem 3.2. $W^n_{j_0}$ converges to the value $\gamma$ of the game.

Proof. The proof is just an adaptation of that of Theorem 3.1, analogously to
the way that the proof of Theorem 2.1 is an adaptation of the proof of the
convergence of the classical procedure (2.2) of value iteration for the discounted cost
problem. Let $u^n, v^n$ be the selected values in (3.8) or (3.9). Then the inequalities
(3.3) hold with $(Q, \widetilde K)$ replacing $(P, K)$. Analogously to the development
in Theorem 3.1, this and (3.10) imply (3.7) and the theorem. ■

Discussion of (3.10). Consider a one dimensional reflected diffusion on the
finite interval $[A, B]$, $B > A$, and let the variance be strictly positive. Approximate
this by an $N$-state Markov chain via the methods of [9]. The
reflecting states are 1 and $N$, which correspond to $A$ and $B$, respectively. If the
discretization interval is small enough, then each state communicates with its
immediate neighbors only, with probabilities that are bounded away from zero,
uniformly in the controls. Then $\inf_{u,v,i} q_{i2}(u,v) > 0$ and we can use any $m \ge 1$
and $j_0 = 2$ in (3.10). This is a consequence of the form of the Gauss-Seidel
iteration, which connects states to those that are lower in the order of the iteration.
An analogous result holds for the multidimensional case, if the diffusion being
approximated is non-degenerate. See [9] for details concerning the approximation,
which is the same for the game problem.
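
The claim about the second column of the modified matrix can be checked on a small example. The Python sketch below builds a hypothetical uncontrolled nearest-neighbor chain with reflecting end states (so the inf over controls is trivial), forms the modified matrix by the recursion (2.5), and compares the second column before and after. The helper names, the five-state size, and the probabilities 0.4 and 0.6 are assumptions made for illustration; they are not the approximating chain of [9].

```python
def build_q(p):
    """Modified transition matrix from the recursion (2.5), 0-based indices."""
    n = len(p)
    q = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            # paths through states s < i, which were updated earlier in the sweep
            q[i][j] = sum(p[i][s] * q[s][j] for s in range(i))
            if j >= i:
                q[i][j] += p[i][j]
    return q


def reflecting_chain(n, down=0.4, up=0.6):
    """Hypothetical chain: interior states move to a neighbor only, and the
    end states 1 and N reflect back into the interior."""
    p = [[0.0] * n for _ in range(n)]
    p[0][1] = 1.0
    p[n - 1][n - 2] = 1.0
    for i in range(1, n - 1):
        p[i][i - 1] = down
        p[i][i + 1] = up
    return p


p = reflecting_chain(5)
q = build_q(p)

col2_p = [row[1] for row in p]   # column for state 2 in the original chain
col2_q = [row[1] for row in q]   # same column in the modified chain
print(col2_p)
print(col2_q)
```

In the original chain most states cannot reach state 2 in one step, whereas every entry of that column is positive in the modified matrix, because each line of the sweep inherits the connections of the lower-numbered lines already processed.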